0. STUDY CONTEXT AND OBJECTIVES

Tile: Exploring Leisure Activity Patterns among Spanish University Students through Network Analysis

Authors:
Diego Diaz-Milanes1,2,*

Affiliation:

1 Department of Quantitative Methods, Universidad Loyola Andalucía, Sevilla, Spain

2 Health Research Institute, University of Canberra, Canberra, ACT, Australia

Corresponding Authors:

This R Markdown document includes the data and R code associated with the present study, with the aim of ensuring full transparency and replicability of the data analysis.

The analysis is structured into six main steps. First, the dataset was loaded and preprocessed. Second, descriptive analyses were performed to examine the sociodemographic characteristics of the participants, the distribution of the items related to leisure, and potential gender differences.

Third, network analysis was conducted on the items. Fourth, tests of network invariance were applied to compare structural differences across gender groups. Fifth, a Bayesian network model was estimated. Finally, the predictive capacity of the different models was evaluated.

1. DATA LOADING & PROCESSING

In this section, the packages, function design, and data used are listed. Then, data processing—including variable selection and filtering by units of analysis—was performed.

1.1. Import Packages

A total of 17 packages were loaded:

library(readxl)
library(tidyr)
library(dplyr)
library(psych)
library(ggplot2)
library(semTools)
library(effectsize)
library(EGAnet)
library(qgraph)
library(NetworkComparisonTest)
library(bootnet)
library(mgm)
library(bnlearn)
library(caret)
library(forcats)
library(knitr)
library(kableExtra)

x <- htmltools::HTML("%<%")

# Define the packages and functions (with inline code formatting)
packages_info <- data.frame(
  Package = c(
    "readxl", "tidyr", "dplyr", "psych", "semTools", "ggplot2", "effectsize",
    "EGAnet", "qgraph", "NetworkComparisonTest", "bootnet", "mgm", "bnlearn",
    "caret", "forcats", "knitr", "kableExtra"
  ),
  Functions = c(
    "`read_excel`",
    paste0(x,", reshaping support"),
    paste0("`select`, `mutate`, `filter`, `group_by`, `summarise`, `bind_rows`, ",x),
    "`describe`, `polychoric`",
    "`mardiaSkew`, `mardiaKurtosis`",
    "`ggplot`, `aes`, `geom_line`, `labs`, `theme_bw`, etc.",
    "`cohens_d`",
    "`bootEGA`, `dimensionStability`",
    "`qgraph`",
    "`NCT`, `plot`",
    "`estimateNetwork`, `bootnet`, `corStability`",
    "`mgm`, `predict`",
    "`boot.strength`, `averaged.network`, `bn.fit`, `predict`",
    "`confusionMatrix`",
    "`fct_reorder`",
    "`kable`",
    "`kable_styling`, `add_header_above`"
  ),
  stringsAsFactors = FALSE
)

# Get package versions
packages_info$Version <- sapply(packages_info$Package, function(pkg) {
  as.character(packageVersion(pkg))
})

# Reorder columns
packages_info <- packages_info[, c("Package", "Version", "Functions")]

# Display as a nicely styled table
knitr::kable(packages_info, caption = "Table 1. Packages Used, Versions, and Functions") %>%
  kableExtra::kable_styling(full_width = FALSE, position = "center")
Table 1. Packages Used, Versions, and Functions
Package Version Functions
readxl 1.4.3 read_excel
tidyr 1.3.1 %<%, reshaping support
dplyr 1.1.4 select, mutate, filter, group_by, summarise, bind_rows, %<%
psych 2.4.1 describe, polychoric
semTools 0.5.6 mardiaSkew, mardiaKurtosis
ggplot2 3.5.2 ggplot, aes, geom_line, labs, theme_bw, etc.
effectsize 0.8.9 cohens_d
EGAnet 2.0.4 bootEGA, dimensionStability
qgraph 1.9.8 qgraph
NetworkComparisonTest 2.2.2 NCT, plot
bootnet 1.5.6 estimateNetwork, bootnet, corStability
mgm 1.2.14 mgm, predict
bnlearn 4.9 boot.strength, averaged.network, bn.fit, predict
caret 6.0.94 confusionMatrix
forcats 1.0.0 fct_reorder
knitr 1.45 kable
kableExtra 1.4.0 kable_styling, add_header_above

Design and generation of a formula for plotting centrality measures of two networks.

compareCentrality <- function(net1, net2,
                              include = c("Strength",
                                          "Closeness",
                                          "Betweenness",
                                          "ExpectedInfluence",
                                          "all",
                                          "All"),
                              orderBy = c("Strength",
                                          "Closeness",
                                          "Betweenness",
                                          "ExpectedInfluence"),
                              decreasing = T,
                              legendName = '',
                              net1Name = 'Network 1',
                              net2Name = 'Network 2'){
  
  if(include == "All" | include == "all"){
    include = c("Strength",
                "Closeness",
                "Betweenness",
                "ExpectedInfluence")
  }
  
  df <- centralityTable(net1, net2) %>% filter(measure %in% include)
  
  df %>% 
    mutate(graph = case_when(graph == 'graph 1' ~ net1Name,
                             graph == 'graph 2' ~ net2Name),
           graph = as.factor(graph),
           node = as.factor(node)) %>% 
    
    mutate(node = fct_reorder(node, value)) %>% 
    
    ggplot(aes(x = node, y = value, group = graph)) +
    
    geom_line(aes(linetype = graph), size = 1) +
    
    labs(x = '', y = '') +
    
    scale_linetype_discrete(name = legendName) +
    
    coord_flip() +
    
    facet_grid(~measure) +
    
    theme_bw()
  
}

1.2. Data Loading

The dataset from the Health Behavior in University (HBU) project (UHU-6272020) was loaded, including only the minimum set of variables required for the present study. The dataset can be found in the following link: https://github.com/diediamil/HBU_Leisure_Network_Analysis/blob/main/HBU_INJUVE_Database.xlsx

hbu <- read_excel("HBU_INJUVE_Database.xlsx")
hbu

1.3. Filtering

Filters were applied to remove participants who did not complete the informed consent (informed_consent), were minors or outside the typical age range for Spanish university students (from 18 to 26 years old), and to exclude cases with missing values:

initial_rows <- nrow(hbu)
hbu <- hbu[hbu$informed_consent == "Valid",] 
hbu <- hbu[!is.na(hbu$gender),]
hbu <- hbu[!is.na(hbu$age),]
hbu <- hbu[hbu$age >= 18 & hbu$age <= 26,]
hbu <- hbu %>%
  filter(if_all(INJU1:INJU17, ~ !is.na(.)))

hbu$age <- as.numeric(hbu$age)

hbu <- hbu %>%
  mutate(across(INJU1:INJU17, ~ case_when(
    . == "I do not engage in this activity"    ~ 1,
    . == "2–3 times per month or rarely"       ~ 2,
    . == "About once a week"                   ~ 3,
    . == "2 times a week or more"              ~ 4
  )))

print(paste("Initial number of rows:", initial_rows, "| Final number of rows:", nrow(hbu)))
## [1] "Initial number of rows: 970 | Final number of rows: 839"
hbu

1.4. Variable Selection

Creation of dataframes tailored to meet the requirements of the forthcoming data analyses.

1. Dataframe for the analysis of individual items

The following table displays the first five rows of the set of items from the INJUVE instrument, used in the present study.

# Prepare the dataframe
hbu$gender <- as.factor(hbu$gender)
INJU_items <- dplyr::select(hbu, INJU1:INJU17)
pieColor <- rep("#EB9446", length(INJU_items))

# Display first 5 rows with a formatted table
head(INJU_items, 5) %>%
  kable(caption = "Table 2. Set of items from the INJUVE") %>%
  kable_styling(full_width = FALSE, position = "center")
Table 2. Set of items from the INJUVE
INJU1 INJU2 INJU3 INJU4 INJU5 INJU6 INJU7 INJU8 INJU9 INJU10 INJU11 INJU12 INJU13 INJU14 INJU15 INJU16 INJU17
2 1 3 1 1 2 1 2 4 1 1 1 4 1 2 1 3
2 2 3 3 1 1 1 2 4 2 1 3 4 4 3 2 4
3 2 4 4 1 1 1 2 4 1 1 1 4 1 4 1 4
2 3 4 2 1 2 2 2 4 1 1 4 4 1 3 1 3
2 2 3 4 2 2 3 2 4 2 2 4 3 2 3 3 3

2. Dataframe for exploring gender differences

The following table shows the first five rows of the dataset including the INJUVE item responses, the total score, and participants’ gender.

# Select CEI items, total score, and gender
INJU_gender <- dplyr::select(hbu, c(INJU1:INJU17, gender))

# Display first 5 rows with styling
head(INJU_gender, 5) %>%
  kable(caption = "Table 3. INJUVE items and gender variable") %>%
  kable_styling(full_width = FALSE, position = "center")
Table 3. INJUVE items and gender variable
INJU1 INJU2 INJU3 INJU4 INJU5 INJU6 INJU7 INJU8 INJU9 INJU10 INJU11 INJU12 INJU13 INJU14 INJU15 INJU16 INJU17 gender
2 1 3 1 1 2 1 2 4 1 1 1 4 1 2 1 3 Female
2 2 3 3 1 1 1 2 4 2 1 3 4 4 3 2 4 Female
3 2 4 4 1 1 1 2 4 1 1 1 4 1 4 1 4 Female
2 3 4 2 1 2 2 2 4 1 1 4 4 1 3 1 3 Female
2 2 3 4 2 2 3 2 4 2 2 4 3 2 3 3 3 Male

3. Dataframe for assessing model invariance across gender

The following table displays the first five rows of the dataset used for testing network invariance by gender, including the INJUVE items and participants’ gender.

# Select INJUVE items and gender for invariance analysis
INJU_invariance <- dplyr::select(hbu, INJU1:INJU17, gender)

# Display first 5 rows with a styled table
head(INJU_invariance, 5) %>%
  kable(caption = "Table 4. INJUVE items and gender variable for network invariance analysis") %>%
  kable_styling(full_width = FALSE, position = "center")
Table 4. INJUVE items and gender variable for network invariance analysis
INJU1 INJU2 INJU3 INJU4 INJU5 INJU6 INJU7 INJU8 INJU9 INJU10 INJU11 INJU12 INJU13 INJU14 INJU15 INJU16 INJU17 gender
2 1 3 1 1 2 1 2 4 1 1 1 4 1 2 1 3 Female
2 2 3 3 1 1 1 2 4 2 1 3 4 4 3 2 4 Female
3 2 4 4 1 1 1 2 4 1 1 1 4 1 4 1 4 Female
2 3 4 2 1 2 2 2 4 1 1 4 4 1 3 1 3 Female
2 2 3 4 2 2 3 2 4 2 2 4 3 2 3 3 3 Male

2. DESCRIPTIVE ANALYSIS

2.1. Sociodemographic

Participants distribution by gender and age:

# Gender distribution
gender_table <- table(hbu$gender)
gender_percent <- round(prop.table(gender_table) * 100, 2)

gender_df <- data.frame(
  Gender = names(gender_table),
  Frequency = round(as.numeric(gender_table), 2),
  Percentage = paste0(gender_percent, "%")
)

# Age summary
age_summary <- summary(hbu$age)
age_sd <- round(sd(hbu$age), 2)

age_df <- data.frame(
  Statistic = names(age_summary),
  Value = round(as.numeric(age_summary), 2)
) %>%
  bind_rows(data.frame(Statistic = "Standard Deviation", Value = age_sd))

# Display tables
kable(gender_df, caption = "Table 5. Gender distribution (frequency and %)") %>%
  kable_styling(full_width = FALSE, position = "center")
Table 5. Gender distribution (frequency and %)
Gender Frequency Percentage
Female 616 73.42%
Male 223 26.58%
kable(age_df, caption = "Table 6. Age summary statistics") %>%
  kable_styling(full_width = FALSE, position = "center")
Table 6. Age summary statistics
Statistic Value
Min. 18.00
1st Qu. 19.00
Median 20.00
Mean 20.64
3rd Qu. 22.00
Max. 26.00
Standard Deviation 2.15

2.2. Items

Univariate descriptive analysis of the items included in the INJUVE instrument:

# Descriptive statistics
descrgen <- describe(INJU_items)

# Build dataframe with selected stats
lgen <- list(c(1:17), descrgen$mean, descrgen$sd, descrgen$min,
             descrgen$max, descrgen$skew, descrgen$kurtosis)
lgen <- as.data.frame(lgen)

# Round and rename columns
lgen <- lgen %>% 
  mutate_if(is.numeric, round, digits = 3)

colnames(lgen) <- c("Item", "Mean", "SD", "Min", "Max", "Skew", "Kurtosis")

# Display table
kable(lgen, caption = "Table 7. Descriptive statistics for INJUVE items") %>%
  kable_styling(full_width = FALSE, position = "center")
Table 7. Descriptive statistics for INJUVE items
Item Mean SD Min Max Skew Kurtosis
1 2.284 0.839 1 4 0.009 -0.747
2 2.087 0.748 1 4 0.234 -0.374
3 3.374 0.702 1 4 -0.791 -0.138
4 2.633 1.248 1 4 -0.125 -1.625
5 1.315 0.715 1 4 2.425 5.222
6 1.522 0.629 1 4 0.938 0.501
7 1.678 0.589 1 4 0.507 0.912
8 1.982 0.556 1 4 0.242 1.164
9 3.795 0.570 1 4 -2.947 8.267
10 1.378 0.548 1 4 1.212 1.222
11 1.263 0.497 1 4 1.869 3.739
12 2.565 1.051 1 4 0.007 -1.217
13 3.414 0.936 1 4 -1.447 0.872
14 2.020 1.149 1 4 0.667 -1.062
15 3.653 0.659 1 4 -1.986 3.568
16 1.819 1.098 1 4 1.021 -0.439
17 3.141 0.879 1 4 -0.708 -0.392

Assessment of multivariate normality assumptions based on Mardia’s tests for skewness and kurtosis, applied to the set of items included in the INJUVE instrument:

# Run Mardia's tests
skew_result <- mardiaSkew(INJU_items)
kurt_result <- mardiaKurtosis(INJU_items)

# Round and extract values
mardia_df <- data.frame(
  Test = c("Mardia Skewness", "Mardia Kurtosis"),
  Statistic = round(c(skew_result[2], kurt_result[2]), 3),
  p_value = round(c(skew_result[4], kurt_result[3]), 3)
)
row.names(mardia_df) <- 1:2
# Show table
kable(mardia_df, caption = "Table 8: Mardia's Tests for Multivariate Normality")
Table 8: Mardia’s Tests for Multivariate Normality
Test Statistic p_value
Mardia Skewness 5456.575 0
Mardia Kurtosis 25.379 0

2.3. Items by Gender

#Setting features
pieColor <- c(rep("#EB9446", length(INJU_items))) # Da color al borde del nodo color naranja
items <- names(INJU_items)
item_labels <- c('Drinking, going out for drinks',
                       'Going to discos, dancing',
                       'Going out to meet friends',
                       'Doing sport',
                       'Attending sporting events',
                       'Going on trips',
                       'Traveling',
                       'Going to the cinema, theater, concerts, etc.',
                       'Listening to music, CDs, tapes',
                       'Going to museums and exhibitions',
                       'Attending conferences and talks',
                       'Reading books, journals, magazines',
                       'Watching television',
                       'Listening to the radio',
                       'Using the computer',
                       'Playing video games, consoles, etc.',
                       'Resting, doing nothing')

# Crear tabla de resultados
results <- lapply(seq_along(items), function(i) {
  item <- items[i]
  label <- item_labels[i]
  
  # Obtener estadísticas por grupo
  stats <- INJU_gender %>%
    group_by(gender) %>%
    summarise(
      mean = round(mean(.data[[item]], na.rm = TRUE), 3),
      sd = round(sd(.data[[item]], na.rm = TRUE), 3)
    )
  
  male_val <- paste0(stats$mean[stats$gender == "Male"], " (", stats$sd[stats$gender == "Male"], ")")
  female_val <- paste0(stats$mean[stats$gender == "Female"], " (", stats$sd[stats$gender == "Female"], ")")
  
  # t-test
  t_result <- t.test(as.formula(paste(item, "~ gender")), data = INJU_gender)
  t_stat <- round(t_result$statistic, 3)
  df <- round(t_result$parameter, 3)
  p_val <- round(t_result$p.value,3)
    
  # Efecto (Cohen's d)
  d_val <- cohens_d(as.formula(paste(item, "~ gender")), data = INJU_gender)$Cohens_d
  d_val <- round(d_val, 3)
  
  # Combinar resultados
  data.frame(
    Item = label,
    Male = male_val,
    Female = female_val,
    T_statistic = paste0(t_stat, " (", df, ")"),
    p_value = p_val,
    Effect_size = d_val
  )
})

# Unir todos los ítems
final_table <- bind_rows(results)

# Mostrar tabla
kable(final_table, caption = "Table 9. Mean (SD) by Gender, t-test Results, p-value and Effect Sizes for INJUVE Items")
Table 9. Mean (SD) by Gender, t-test Results, p-value and Effect Sizes for INJUVE Items
Item Male Female T_statistic p_value Effect_size
Drinking, going out for drinks 2.283 (0.852) 2.284 (0.835) 0.024 (386.5) 0.981 0.002
Going to discos, dancing 1.996 (0.774) 2.12 (0.736) 2.087 (376.545) 0.038 0.167
Going out to meet friends 3.439 (0.7) 3.351 (0.701) -1.622 (393.505) 0.106 -0.127
Doing sport 3.283 (1.021) 2.398 (1.24) -10.452 (473.845) 0.000 -0.746
Attending sporting events 1.798 (1.013) 1.14 (0.46) -9.364 (255.812) 0.000 -1.007
Going on trips 1.646 (0.668) 1.477 (0.608) -3.303 (363.613) 0.001 -0.270
Traveling 1.605 (0.559) 1.705 (0.598) 2.228 (418.685) 0.026 0.169
Going to the cinema, theater, concerts, etc. 1.955 (0.613) 1.992 (0.533) 0.792 (350.643) 0.429 0.066
Listening to music, CDs, tapes 3.816 (0.543) 3.787 (0.58) -0.666 (417.769) 0.506 -0.050
Going to museums and exhibitions 1.408 (0.561) 1.367 (0.543) -0.948 (382.032) 0.344 -0.075
Attending conferences and talks 1.287 (0.501) 1.255 (0.495) -0.823 (389.698) 0.411 -0.065
Reading books, journals, magazines 2.475 (1.073) 2.597 (1.042) 1.467 (383.416) 0.143 0.116
Watching television 3.283 (1.012) 3.461 (0.903) 2.322 (357.673) 0.021 0.191
Listening to the radio 2.058 (1.175) 2.006 (1.141) -0.569 (383.31) 0.570 -0.045
Using the computer 3.628 (0.704) 3.662 (0.642) 0.642 (363.942) 0.521 0.052
Playing video games, consoles, etc. 2.691 (1.162) 1.503 (0.883) -13.878 (319.613) 0.000 -1.230
Resting, doing nothing 3.13 (0.961) 3.144 (0.848) 0.198 (354.454) 0.843 0.016

3. NETWORK ANALYSIS

3.1. Community Identification

A bootstrap exploratory graph analysis (EGA) was performed to estimate the number, structure, and stability of variable communities (i.e., network dimensions) based on their structural consistency. In this context, a community refers to a cluster of nodes that exhibit strong interconnections through edges.

3.1.1 Community Estimation

The analysis facilitates the identification and delineation of such communities, as well as the allocation of individual items to their corresponding dimensions.

set.seed(123)
communities <- bootEGA(INJU_items,
                       model = "glasso",
                       type = "resampling",
                       iter = 1000,
                       typicalStructure = TRUE,
                       plot.typicalStructure = TRUE)
Figure 1. Community Estimation

Figure 1. Community Estimation

3.1.2. Community Stability

Items were retained within their assigned communities only when their stability coefficients exceeded 0.70. Items falling below this threshold were excluded due to their potential to undermine the structural integrity of the corresponding dimension.

group_list <- list('Festive leisure'=c(1:3), 'Sports–competitive leisure'=c(4,5,16,17), 'Cultural leisure'=c(6,7,8,10,11,12), 'Passive leisure'=c(9,13,14,15))

# Structural consistency
communities.dimstab <- dimensionStability(communities)
Figure 2. Community Stability

Figure 2. Community Stability

Regarding the communities stability it was:

communities.dimstab$dimension.stability
## $structural.consistency
##     1     2     3     4 
## 1.000 0.849 0.340 0.604 
## 
## $average.item.stability
##         1         2         3         4 
## 1.0000000 0.9600000 0.8728571 0.7000000

Regarding the items stability it was:

communities.dimstab$item.stability
## EGA Type: EGA 
## Bootstrap Samples: 1000 (Resampling)
## 
## Proportion Replicated in Dimensions:
## 
##  INJU1  INJU2  INJU3  INJU4  INJU5  INJU6  INJU7  INJU8  INJU9 INJU10 INJU11 
##  1.000  1.000  1.000  0.997  0.998  0.997  0.999  0.970  0.677  0.998  0.996 
## INJU12 INJU13 INJU14 INJU15 INJU16 INJU17 
##  0.764  0.684  0.386  0.739  0.994  0.851

3.2. Gaussian Graphical Models (GGMs)

3.2.1. Network Estimation

INJU_poly <- polychoric(INJU_items)

# Graficar red
suppressMessages(suppressWarnings({
INJU_glasso <- qgraph::qgraph(
  INJU_poly$rho,
  graph        = "glasso",
  layout       = "spring",
  groups       = group_list,
  palette      = "ggplot2",
  nodeNames    = item_labels,
  labels       = TRUE,
  threshold    = TRUE,
  sampleSize   = nrow(INJU_items),
  vsize        = 5,
  esize        = 5,
  asize        = 6,
  border.width = 1,
  border.color = "black",
  unCol        = "black",
  theme        = "colorblind",
  legend       = TRUE,          # <‑‑ leyenda automática
  legend.cex   = 0.8,           # tamaño texto de la leyenda
  layoutScale  = c(0.8, 0.8),   # encoge ligeramente la red
  rescale      = TRUE
)
}))
Figure 3. Gaussian Graphical Model Estimation

Figure 3. Gaussian Graphical Model Estimation

possible_edges <- (ncol(INJU_items)*(ncol(INJU_items)-1))/2
real_edges <- length(INJU_glasso$Edgelist$weight)
density <- round(real_edges/possible_edges*100, digits = 2)

# Print explanation and result
cat("Number of potential edges:", possible_edges, "\n")
## Number of potential edges: 136
cat("Number of real edges:", real_edges, "\n")
## Number of real edges: 28
cat("Network density (percentage of actual edges out of all possible undirected edges):", density, "%\n")
## Network density (percentage of actual edges out of all possible undirected edges): 20.59 %
# Extract edge weights
edge_weights <- INJU_glasso$Edgelist$weight

# Calculate summary statistics
mean_weight <- round(mean(edge_weights), digits = 3)
sd_weight <- round(sd(edge_weights), digits = 3)
max_weight <- round(max(edge_weights), digits = 3)
min_weight <- round(min(edge_weights), digits = 3)

# Print explanation and results clearly
cat("Standard Edge Weight \n")
## Standard Edge Weight
cat("The average edge weight was:", mean_weight, "(SD:", sd_weight, ")\n")
## The average edge weight was: 0.184 (SD: 0.189 )
cat("Minimum weight:", min_weight, "| Maximum weight:", max_weight, "\n")
## Minimum weight: -0.217 | Maximum weight: 0.611
# Calculate summary statistics
mean_weight <- round(mean(abs(edge_weights)), digits = 3)
sd_weight <- round(sd(abs(edge_weights)), digits = 3)
max_weight <- round(max(abs(edge_weights)), digits = 3)
min_weight <- round(min(abs(edge_weights)), digits = 3)

# Print explanation and results clearly
cat("\n")
cat("Absolute Values \n")
## Absolute Values
cat("The average edge weight was:", mean_weight, "(SD:", sd_weight, ")\n")
## The average edge weight was: 0.233 (SD: 0.12 )
cat("Minimum weight:", min_weight, "| Maximum weight:", max_weight, "\n")
## Minimum weight: 0.101 | Maximum weight: 0.611

3.2.2. Network characterization

INJU_matrix <- as.matrix(INJU_items)
invisible(capture.output({SV_mgm <-mgm(
    data = INJU_matrix,
    type = rep("g", length(INJU_items)),
    level = rep(1, length(INJU_items)),
    k = 2,
    verbatim = TRUE,
    warnings = TRUE
  )
}))

#Predice la varianza explicada de cada nodo y otros parametros
SV_predic <- predict(SV_mgm, INJU_items, error.continuous = "VarExpl")

suppressMessages(suppressWarnings({
INJU_glasso_predicted <- qgraph::qgraph(
  cor_auto(INJU_items),
  graph        = "glasso",
  layout       = "spring",
  groups       = group_list,
  palette      = "ggplot2",
  nodeNames    = item_labels,
  labels       = TRUE,
  threshold    = TRUE,
  sampleSize   = nrow(INJU_items),
  pie          = SV_predic$errors$R2,
  pieColor     = pieColor,
  vsize        = 5,
  esize        = 5,
  asize        = 6,
  border.width = 0.9,
  border.color = "black",
  unCol        = "black",
  theme        = "colorblind",
  legend       = TRUE,          # <‑‑ leyenda automática
  legend.cex   = 0.8,           # tamaño texto de la leyenda
  layoutScale  = c(0.8, 0.8),   # encoge ligeramente la red
  rescale      = TRUE
)
}))
Figure 4. Gaussian Graphical Model Characterization

Figure 4. Gaussian Graphical Model Characterization

3.3. Centrality Measures

3.3.1. Centality Estimation

centrality_INJU <- centralityPlot(INJU_glasso, include = 
                                  c("Betweenness","Closeness","Strength","ExpectedInfluence"),
                                orderBy ="Betweenness", scale = "z-scores")
Figure 5. Centrality Measures Estimation

Figure 5. Centrality Measures Estimation

3.3.2. Centrality Stability

set.seed(123)
# 1. Estimar la red con EBICglasso
glasso_net <- estimateNetwork(INJU_items, 
                              default = "EBICglasso", 
                              corMethod = "cor_auto",
                              verbose = FALSE)

set.seed(123)
# 2. Bootstrap de estabilidad (case-dropping)
boot_results <- bootnet(glasso_net,
                        nBoots = 1000,
                        nCores = 12,
                        type = "case",
                        statistics = "all",
                        verbose = FALSE)

# 3. Plot de estabilidad de centralidad (bootstrap case-dropping)
plot(boot_results,
     statistics = c("strength", "closeness", "betweenness","expectedInfluence"),
     labels = TRUE,
     order = "sample",
     legend = TRUE,
     color = "darkblue",
     line = TRUE,
     main = "Bootstrap Centrality Stability (Case Dropping)")
Figure 6. Centrality Measures Stability

Figure 6. Centrality Measures Stability

# 4. Coeficientes de estabilidad de centralidad
cs_coeffs <- corStability(boot_results, verbose = FALSE)

# Crear tabla
cs_table <- data.frame(
  Centrality = c("Betweenness", "Closeness", "Strength", "ExpectedInfluence"),
  CS_Coefficient = round(c(cs_coeffs[1], cs_coeffs[2], cs_coeffs[10], cs_coeffs[6]), 3)
)
row.names(cs_table) <- 1:4

# Mostrar como tabla en R Markdown
knitr::kable(cs_table, caption = "Table 10: Centrality Stability Coefficients (CS)")
Table 10: Centrality Stability Coefficients (CS)
Centrality CS_Coefficient
Betweenness 0.205
Closeness 0.284
Strength 0.672
ExpectedInfluence 0.750

4. NETWORK COMPARISON

4.1. General

INJU_male <- INJU_invariance[INJU_invariance$gender == "Male",-length(INJU_invariance)]
INJU_female <- INJU_invariance[INJU_invariance$gender == "Female",-length(INJU_invariance)]

set.seed(12)
EN_male = estimateNetwork(INJU_male, default = "EBICglasso", corMethod = "cor_auto")
EN_female = estimateNetwork(INJU_female, default = "EBICglasso", corMethod = "cor_auto")
set.seed(123)
invisible(capture.output({
  res <- NCT(
    INJU_female,
    INJU_male,
    p.adjust.methods = "BH",
    binary.data = FALSE,
    test.edges = TRUE,
    edges = "all",
    it = 1000
  )
}))

female_str <- round(res$glstrinv.sep[1], 3)
male_str <- round(res$glstrinv.sep[2], 3)
p_str <- ifelse(res$glstrinv.pval < 0.001, "p < 0.001", paste0("p = ", round(res$glstrinv.pval, 3)))

dif_net <- round(res$nwinv.real, 3)
p_net <- ifelse(res$nwinv.pval < 0.001, "p < 0.001", paste0("p = ", round(res$nwinv.pval, 3)))

p_edges <- paste(
  dplyr::filter(res$einv.pvals, `p-value` < 0.05) %>%
    dplyr::mutate(pair = paste(Var1, "-", Var2)) %>%
    dplyr::pull(pair),
  collapse = ", "
)

In terms of global strength, the female network showed a lower overall connectivity (S = 2.127) compared to the male network (S = 1.426). This difference was not statistically significant (p = 0.674).

plot(res, what="strength")
Figure 7. P-values for differences in network strength between genders

Figure 7. P-values for differences in network strength between genders

The network structure invariance test revealed a difference of 0.119 between the female and male networks. However, this difference was not statistically significant (p = 0.879), suggesting that the overall structure of the two networks can be considered similar.

plot(res, what="network")
Figure 8. P-values for differences in network structure between genders

Figure 8. P-values for differences in network structure between genders

4.2. Networks by Gender

Female’s Network

# --- FEMALE GROUP ---

# Convert data to matrix
INJU_female_matrix <- as.matrix(INJU_female)

# Estimate MGM model (defines number of variables, order of interactions, and input data)
invisible(capture.output({
  INJU_mgm_female <- mgm(
    data = INJU_female_matrix,
  type = rep("g", length(INJU_female)),
  level = rep(1, length(INJU_female)),
  k = 2,
  warnings = TRUE,
  verbatim = TRUE
  )
}))

# Predict explained variance (R²) and other node-level metrics
INJU_predic_female <- predict(
  INJU_mgm_female,
  INJU_female,
  error.continuous = "VarExpl"
)

# Plot the network with R² represented as pie charts on nodes
suppressMessages(suppressWarnings({
INJU_glasso_predicted_female <- plot(
  EN_female,
  layout = INJU_glasso_predicted$layout,
  groups       = group_list,
  palette      = "ggplot2",
  nodeNames    = item_labels,
  labels       = TRUE,
  threshold    = FALSE,
  sampleSize   = nrow(INJU_female),
  pie          = INJU_predic_female$errors$R2,
  pieColor     = pieColor,
  vsize        = 5,
  esize        = 5,
  asize        = 6,
  border.width = 0.9,
  border.color = "black",
  unCol        = "black",
  theme        = "colorblind",
  legend       = TRUE,          # <‑‑ leyenda automática
  legend.cex   = 0.8,           # tamaño texto de la leyenda
  layoutScale  = c(0.8, 0.8),   # encoge ligeramente la red
  rescale      = TRUE
)
}))
Figure 9. Networks of Female Participants

Figure 9. Networks of Female Participants

Male’s network

# --- MALE GROUP ---

# Convert data to matrix
INJU_male_matrix <- as.matrix(INJU_male)

# Estimate MGM model (defines number of variables, order of interactions, and input data)
invisible(capture.output({
  INJU_mgm_male <- mgm(
    data = INJU_male_matrix,
    type = rep("g", length(INJU_male)),
    level = rep(1, length(INJU_male)),
    k = 2,
    warnings = TRUE,
    verbatim = TRUE
  )
}))

# Predict explained variance (R²) and other node-level metrics
INJU_predic_male <- predict(
  INJU_mgm_male,
  INJU_male,
  error.continuous = "VarExpl"
)

# Plot the network with R² represented as pie charts on nodes
suppressMessages(suppressWarnings({
INJU_glasso_predicted_male <- plot(
  EN_male,
  layout = INJU_glasso_predicted$layout,
  groups       = group_list,
  palette      = "ggplot2",
  nodeNames    = item_labels,
  labels       = TRUE,
  threshold    = FALSE,
  sampleSize   = nrow(INJU_male),
  pie          = INJU_predic_male$errors$R2,
  pieColor     = pieColor,
  vsize        = 5,
  esize        = 5,
  asize        = 6,
  border.width = 0.9,
  border.color = "black",
  unCol        = "black",
  theme        = "colorblind",
  legend       = TRUE,          # <‑‑ leyenda automática
  legend.cex   = 0.8,           # tamaño texto de la leyenda
  layoutScale  = c(0.8, 0.8),   # encoge ligeramente la red
  rescale      = TRUE
)
}))
Figure 10. Networks of Male Participants

Figure 10. Networks of Male Participants

4.3. Centrality Measures

To evaluate the invariance of the centrality measures network across genders.

# Compare centralities between networks
network_comparison <- compareCentrality(
  INJU_glasso_predicted_female,
  INJU_glasso_predicted_male,
  include = "all",
  legendName = "Centrality Measures by Gender",
  net1Name = "Female",
  net2Name = "Male"
)
plot(network_comparison)
Figure 11. Centrality comparison between female and male networks

Figure 11. Centrality comparison between female and male networks

The Network Comparison Test identified some statistically significant differences in edge strength between the female and male networks after applying the Benjamini–Hochberg procedure to correct p-values for multiple comparisons. The significance level was set at the conventional threshold of 0.05 (see Table 11). However, these results should be interpreted with caution, as the overall network structure and edge weights showed no general differences. Thus, the observed variations may not be practically relevant.

set.seed(123)
invisible(capture.output({
  nct_test_centrality <- NCT(
    INJU_female,
    INJU_male,
    p.adjust.methods = "BH",
    binary.data = FALSE,
    test.centrality = TRUE,
    edges = "all",
    centrality = c("closeness", "betweenness", "strength", "expectedInfluence"),
    it = 1000
  )
}))

# Extract p-values and convert to data frame
centrality_pvals <- as.data.frame(nct_test_centrality$diffcen.pval)

# Round to 3 decimals (and adjust "< 0.001" if you want a more elegant output)
centrality_pvals_rounded <- round(centrality_pvals, 3)

# Optional: add row names as a column
centrality_pvals_rounded$Node <- rownames(centrality_pvals_rounded)
centrality_pvals_rounded <- centrality_pvals_rounded[, c("Node", setdiff(names(centrality_pvals_rounded), "Node"))]
row.names(centrality_pvals_rounded) <- 1:17

# Display as table
kable(centrality_pvals_rounded, caption = "Table 11. P-values for Centrality Differences (NCT Test)")
Table 11. P-values for Centrality Differences (NCT Test)
Node closeness betweenness strength expectedInfluence
INJU1 NA 0.051 0.911 0.898
INJU2 NA 0.391 0.911 0.911
INJU3 NA 1.000 0.911 0.911
INJU4 NA 0.911 0.967 0.967
INJU5 NA 0.999 0.996 0.996
INJU6 NA 1.000 0.967 0.967
INJU7 NA 0.967 0.967 0.967
INJU8 NA 1.000 0.911 0.911
INJU9 NA 0.025 0.911 0.911
INJU10 NA 0.898 0.911 0.911
INJU11 NA 0.911 0.967 0.967
INJU12 NA 0.025 0.967 0.967
INJU13 NA 0.214 0.967 0.967
INJU14 NA 0.911 0.967 0.967
INJU15 NA 0.140 0.927 0.927
INJU16 NA 0.911 0.967 0.967
INJU17 NA 1.000 0.911 0.911

5. BAYESIAN NETWORK

The construction of the Bayesian network (BN) is carried out in two steps. Firstly, the estimation of the directed acyclic graph (DAG) is performed, and secondly, the BN model is fitted and validated with the study dataset.

5.1. DAG Estimation

For the DAG estimation phase, the PC Stable algorithm with no restrictions was employed. To ensure stability in the obtained DAG, a total of 200 bootstrap samples were drawn, and only the edges with a strength greater than 0.85 and a direction greater than 0.5 were retained in the final model.

set.seed(1234)
bootstr <- boot.strength(INJU_items, R = 200, algorithm = "pc.stable")

avgnet <- averaged.network(bootstr, threshold = 0.85)

sp <- strength.plot(avgnet, bootstr, shape = "ellipse", render = FALSE)

INJU_DAG_qgraph <- qgraph::qgraph(
  sp, layout = "spring",
  groups       = group_list,
  palette      = "ggplot2",
  nodeNames    = item_labels,
  labels       = TRUE,
  threshold    = FALSE,
  sampleSize   = nrow(INJU_items),
  vsize        = 4.5,
  esize        = 3,
  asize        = 3,
  border.width = 0.9,
  border.color = "black",
  unCol        = "black",
  theme        = "colorblind",
  legend       = TRUE,          # <‑‑ leyenda automática
  legend.cex   = 0.8,           # tamaño texto de la leyenda
  layoutScale  = c(0.8, 0.8),   # encoge ligeramente la red
  rescale      = TRUE
)
Figure 12. Directed Acyclic Graph (DAG)

Figure 12. Directed Acyclic Graph (DAG)

5.2. Bayesian Network Validation

To fit and validate the BN model in the second phase of the process, the dataset was subdivided into 10 folds. A routine was implemented in which 90% of the folds were used to train the model, and the remaining 10% were used for testing. This cross-validation routine was repeated until all potential combinations were explore.

pieColor <- c(rep("#EB9446", length(INJU_items))) # Da color al borde del nodo color naranja                        
#Cross-validation (Check spatial cross-validation: blockCV)
set.seed(1234)
# Split data in 5 sets
kf <- dismo::kfold(nrow(INJU_items), k = 10) # k-fold partitioning
models_fit <- c()

for (var in names(INJU_items)) {
  kfold_fit <- c()
  aux_base <- data.frame('variable'=var)
  if(is.numeric(INJU_items[[var]])){
    for(k in 1:10) {
      test <- INJU_items[kf == k, ]
      train <- INJU_items[kf != k, ]
      
      training <- bn.fit(avgnet,train)
      pred <- predict(training, node = var, data = test)
      
      z <- data.frame( R2 = R2(pred, test[[var]]),
                       RMSE = RMSE(pred, test[[var]]),
                       MAE = MAE(pred, test[[var]]))
      if(is.na(z$R2)){
        z[is.na(z$R2),] <- 0
      }
      kfold_fit <- rbind(kfold_fit, z) 
    }
    
    z <- apply(kfold_fit, 2, mean)
    z <- cbind(aux_base,as.data.frame(t(z)))
    models_fit <- rbind(models_fit, z)
  }else{
    for(k in 1:10) {
      # Split data in test and train
      test <- INJU_items[kf == k, ]
      train <- INJU_items[kf != k, ]
      
      training <- bn.fit(avgnet,train)
      pred <- predict(training, node = var, data = test)
      
      validation <- confusionMatrix(as.factor(pred),test[[var]])
      z1 <- as.data.frame(t(validation$overall))
      
      if(length(levels(INJU_items[[var]])) > 2){
        z2 <- as.data.frame(t(sapply(as.data.frame(validation$byClass),mean))) 
      }else{
        z2 <- as.data.frame(t(sapply(validation$byClass,mean))) 
      }
      z <- cbind(z1, z2)
      
      kfold_fit <- rbind(kfold_fit, z)
    }
    z <- apply(kfold_fit, 2, mean)
    z <- cbind(aux_base,z)
    models_fit <- rbind(models_fit, z)
  }
}

INJU_BN_qgraph <- qgraph::qgraph(
  sp, layout = "spring",
  groups       = group_list,
  palette      = "ggplot2",
  nodeNames    = item_labels,
  pieColor = pieColor,
  pie = models_fit$R2,
  labels       = TRUE,
  threshold    = FALSE,
  sampleSize   = nrow(INJU_items),
  vsize        = 4.5,
  esize        = 3,
  asize        = 3,
  border.width = 0.9,
  border.color = "black",
  unCol        = "black",
  theme        = "colorblind",
  legend       = TRUE,          # <‑‑ leyenda automática
  legend.cex   = 0.8,           # tamaño texto de la leyenda
  layoutScale  = c(0.8, 0.8),   # encoge ligeramente la red
  rescale      = TRUE
)
Figure 13. Bayesian Network (BN) Model

Figure 13. Bayesian Network (BN) Model

6. PREDICTAIBILITY CAPACITY

The explained variance (R²) and prediction error (RMSE) for each item were estimated using partial-correlation networks (GGMs) separately for the full sample, females, and males (see Table 12). Across all three GGM models, Item 1 consistently showed the highest explained variance and lowest prediction error, indicating it is the most predictable item based on the rest of the network. In contrast, Item 9 had the lowest explained variance and the highest RMSE, suggesting it is the least predictable.

A notable gender difference was also observed in Item 3, with a discrepancy of 0.059 in R² between females and males, indicating higher predictability for the male group on this item.

Regarding the Bayesian Network (BN) model, Item 7 showed the highest explained variance, suggesting it is most influenced by other nodes in the network. Item 4 presented the lowest RMSE, representing the most accurate prediction in terms of error. Conversely, Item 16 displayed the poorest performance in explained variance, while Item 12 had the highest RMSE, indicating the largest prediction error in the BN model (see Table 12).

# Item labels
items <- paste("Item", 1:17)

# GGM - General
ggm_general <- SV_predic$errors %>%
  transmute(
    Variable = items,
    R2_gen = round(R2, 3),
    RMSE_gen = round(RMSE, 3)
  )

# GGM - Female
ggm_female <- INJU_predic_female$errors %>%
  transmute(
    R2_fem = round(R2, 3),
    RMSE_fem = round(RMSE, 3)
  )

# GGM - Male
ggm_male <- INJU_predic_male$errors %>%
  transmute(
    R2_male = round(R2, 3),
    RMSE_male = round(RMSE, 3)
  )

# BN model
bn <- models_fit %>%
  transmute(
    R2_bn = ifelse(R2 == 0, "-", round(R2, 3)),
    RMSE_bn = ifelse(RMSE == 0, "-", round(RMSE, 3))
  )

# Combine all safely
final_table <- bind_cols(ggm_general, ggm_female, ggm_male, bn)

# Set final column names to just "R²" and "RMSE" per section
names(final_table) <- c(
  "Variable",
  rep(c("R²", "RMSE"), times = 4)  # General, Female, Male, BN
)

# Render with multi-level header
final_table %>%
  kable(align = "lcccccccc", booktabs = TRUE,
        caption = "Table 12. Explained variance and RMSE of partial-correlation network (GGM) and BN models") %>%
  add_header_above(c(" " = 1, 
                     "General" = 2, 
                     "Female" = 2, 
                     "Male" = 2, 
                     " " = 2)) %>%
    add_header_above(c(" " = 1, 
                     "GGMs" = 6, 
                     "BN" = 2)) %>%
  kable_styling(full_width = FALSE, position = "center")
Table 12. Explained variance and RMSE of partial-correlation network (GGM) and BN models
GGMs
BN
General
Female
Male
Variable RMSE RMSE RMSE RMSE
Item 1 0.466 0.730 0.479 0.721 0.445 0.743 0.205 0.752
Item 2 0.432 0.753 0.445 0.745 0.423 0.758 0.429 0.57
Item 3 0.227 0.879 0.215 0.885 0.274 0.850
Item 4 0.131 0.931 0.081 0.958 0.112 0.940
Item 5 0.216 0.885 0.088 0.954 0.259 0.859 0.135 0.666
Item 6 0.256 0.862 0.214 0.886 0.363 0.796 0.196 0.561
Item 7 0.193 0.898 0.181 0.904 0.282 0.845 0.121 0.556
Item 8 0.112 0.942 0.118 0.939 0.131 0.930 0.07 0.537
Item 9 0.059 0.969 0.066 0.966 0.000 0.998
Item 10 0.285 0.845 0.269 0.854 0.345 0.808
Item 11 0.182 0.904 0.154 0.919 0.300 0.835 0.176 0.455
Item 12 0.146 0.923 0.129 0.933 0.212 0.886 0.107 1.004
Item 13 0.102 0.947 0.077 0.960 0.175 0.906 0.067 0.907
Item 14 0.098 0.949 0.079 0.959 0.235 0.873
Item 15 0.084 0.957 0.074 0.962 0.124 0.934 0.038 0.65
Item 16 0.161 0.916 0.080 0.959 0.062 0.967 0.12 1.029
Item 17 0.079 0.959 0.082 0.957 0.109 0.942